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Abstract. Many studies record replicated time series epochs from different groups with the goal 
of using frequency domain properties to discriminate between the groups. In many applications, 
there exists variation in cyclical patterns from time series in the same group. Although a number 
of frequency domain methods for the discriminant analysis of time series have been explored, there 
is a dearth of models and methods that account for within-group spectral variability. This article 
proposes a model for groups of time series in which transfer functions are modeled as stochastic 
variables that can account for both between-group and within-group differences in spectra that 
are identified from individual replicates. An ensuing discriminant analysis of stochastic cepstra 
under this model is developed to obtain parsimonious measures of relative power that optimally 
separate groups in the presence of within-group spectral variability. The approach possess favorable 
properties in classifying new observations and can be consistently estimated through a simple 
discriminant analysis of a finite number of estimated cepstral coefficients. Benefits in accounting 
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for within-group spectral variability are empirically illustrated in a simulation study and through 
an analysis of gait variability. 

Keywords. Cepstral Analysis. Fisher’s Discriminant Analysis. Replicated Time Series. Spec¬ 
tral Analysis. 
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1 Introduction 


Discriminant analysis of time series is important in a variety of fields including physics, geology, 
acoustics, economics, and medicine. In applications where scientifically meaningful information is 
contained within power spectra, spectral domain approaches are desired. A number of spectral 
based methods for the discriminant and classification analysis of time series have been developed. 


These methods, a review of which can be found in Chapter 7.7 of Shumway and Stoffer (2011), in¬ 


clude Shumway and Unger (1974), Dargahi-Noubary and haycock (1981), Shumway (1982), Alagon 


(1989), Zhang and Taniguchi (1994), and Kakizawa et al. (1998). 

The aforementioned methods are based on models that assume the existence of group-common 
power spectra that can be consistently estimated from any single time series replicate from within 
a group. In many applications, this assumption does not hold as there exist obvious differences in 
second-order spectra that can be identified by individual time series within the same group. As an 
example, consider stride interval series, or the time taken to complete consecutive gait cycles, that 
were collected as part of a study to better understand connections between walking patterns and 
neurological disease (Hausdorff et ah, 2000). Figure displays three examples of stride interval 
series from study participants under each of three neurological conditions and Figure displays 
estimated log-spectra for each stride interval series in the study. Aside from neurological conditions, 
walking is influenced by numerous person specific physiological characteristics. Consequently, there 
exist differences in cyclical patterns of stride interval series from different subjects with a common 
neurological condition. 

The presence of extra spectral variability in replicated time series and the inability of traditional 
time series models and methods to account for it was hrst discussed in the literature by |Diggle and 


Al Wasel (1997). They introduced a parametric log-spectral mixed-effects model for replicated time 


series, which was later nonparametrically generalized by Saavedra et al. (2000, 2008), lannaccone 
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Figure 1: Detrended stride interval series from 9 participants in the gait analysis study: 3 healthy 
controls, 3 participants with amyotrophic lateral sclerosis (ALS), and 3 participants with Hunting¬ 
ton’s disease. 


and Coles (2001), Freyermuth et al. (2010) and Krafty et al. (2011). Although these models allow 


one to conduct inference on a group average spectrum, the question of how within-group spectral 
variability affects discriminant analysis has yet to be addressed. To address this question, this 
article introduces a stochastic transfer function model for groups of time series in the presence of 
within-group spectral variability that makes explicit the replicate-specific spectra that are estimable 
from individual series. The model accounts for the higher-order long-range dependence that is 
responsible for within-group spectra variability. 

The Fisher’s discriminant analysis of stochastic cepstra, or inverse Fourier transforms of log- 
spectra, is explored as a means of discriminating and classifying time series under the stochastic 
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Figure 2: Estimated replicate specific log-spectra for the detrended stride interval series from each 
participant in the gait analysis study. 


transfer function model. The procedure possess optimal properties that mirror those of Fisher- 
type discriminants in other high-dimensional settings. When extra spectral variability exists, the 
cepstral Fisher’s discriminant analysis more accurately classihes a new time series of unknown group 
membership compared to existing spectral methods that ignore within-group spectral variability. 
Other Fisher’s type approaches can be developed under the stochastic transfer function model, 
such as through integral functions of log-spectra. We use the cepstral formulation as it provides 
both a more encompassing theoretical framework and an intuitive approach to estimation that 
simultaneously overcomes the inconsistency of periodograms and the high-dimensionality of the 
data. 


As discussed in Chapter 11 of Johnson and Wichern (2007), although intertwined, there is a dis¬ 


tinction between discriminant analysis, which seeks parsimonious measures that illuminate group 
separation, and classification analysis, or the prediction of group membership of new observations. 
The aforementioned existing methods for spectral discrimination address the problem of classifica- 
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tion, albeit with a higher error than the proposed procedure when within-group variability exists, 


but are black-box in nature and do little in helping to understand the way in which groups are 
separated. The proposed procedure address the discrimination problem by producing discriminants 
and weight functions that provide low-dimensional interpretable measures of relative power that 
illustrate scientific mechanisms that separate groups. 

The rest of this article is organized as follows. The stochastic transfer function model for groups 
of replicated time series with within-group spectral variability is presented in Section In Section 
we discuss the cepstral Fisher’s procedure for discriminant and classification analysis. Section 
introduces an intuitive estimation procedure via a finite number of estimated cepstral coefficients. 
Simulation studies are explored in Section to investigate empirical properties of the proposed 
procedure and to compare these properties to those of existing methods. The method is used in 
Section to analyze data from the motivating study of gait variability. Concluding remarks are 
given in Section Proofs are relegated to the appendix. 

2 Model 

2.1 Stochastic Transfer Function Modei 

Consider a population of real-valued stationary time series composed of j = 1,..., J groups, lij 
represents the jth group, and tTj is the proportion of time series from the population belonging 
to Ilj. To present a model that accounts for both between and within-group spectral variability, 
we consider a model with stochastic replicate-specific transfer functions. Replicate-specific transfer 
functions Ajk are defined as independent second-order random variables identically distributed for 
each replicate k within group j and whose realizations are complex-valued random functions that 
are Hermitian, have period 1, and both real and imaginary parts are uniformly continuous. The 
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group IIj is modeled as the collection of time series {Xjkt',t G Z} where 


Xjkt= / 
Jo 


( 2 . 1 ) 


Zjk are independent and identically distributed orthogonal processes that are independent of the 
transfer functions and E|dZjfc(A)|^ = 1. Mixing conditions are assumed on Zjk such that Zjkt = 
is, in some sense, short-range dependent. Many different mixing conditions can be 


used. In this article we use the conditions in Assumption 2.6.1 of Brillinger (2001) such that Zjkt 
is strictly stationary, cumulants of all orders exits and are absolutely summable. 

Each time series in Yij is an independent and identically distributed stationary process with 
Cramer representation W(A) = and spectrum E|Ajfc(A)p. When 

Var ||Ajfc(A)p| = 0 for all A, Xjkt is short-range dependent. However, when the variance of 
replicate-specific spectra is non-trivial, it is not. Conditional on the replicate-specific random 
transfer function Ajk, Xjkt is stationary, short-range dependent, and has replicate-specific power 
spectrum \Ajk{A)\‘^. The stochastic transfer function model is a reparameterization of the Cramer 
representation that makes explicit the unit-specific spectra that are estimable from individual repli¬ 
cates. Before comparing the stochastic transfer function model to other models, consider the fol¬ 
lowing example. 

Example: Conditional MA(1). Let lij be the collection of conditional invertible Gaussian 
MA(1) processes with nonnegative autocorrelation such that Xjkt = ^jkt + djk^jkt-i where Ojk are 


iid 


uniformly distributed over [0,1] and are independent of ejkt ~ A(0, These time series have a 


stochastic transfer function representation of the form (2.1) where Ajk{A) = a (l -|- 9jke 


Zjk is complex Brownian motion with Cov {Zjk{A), Zjk{u})} = min(A,a;) and Zjk{—A) = Zjk{A). 
Conditional on 6jk, Xjkt is a Gaussian MA(1) process with replicate-specific spectrum \Ajk{A)\^ = 
cr^ |l -|- + 20jfc cos(27rA)|. As a Gaussian MA(1) process, the conditional autocovariance and 

all conditional higher-order moments vanish for lags greater than 1. Consequently, standard results 
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hold so that the periodogram from a replicate at Fourier frequencies are approximately independent 
and can be smoothed to obtain a consistent estimate of the replicate-specific spectrum. Marginally, 
Xjkt is stationary, Cov {Xjkt+h, Xjkt) = 0 when \h\ > 1, and it has a marginal power spectrum of 
'E\Ajk{X)\‘^ = {4/3-|-cos(27rA)}. However, higher order cumulants do not decay. For instance, 

Cov Xjf,^ = 4ct'^/ 45 for all |h| > 1. Consequently, periodograms are correlated and cannot 

be smoothed to obtain a consistent estimate of the marginal spectrum. 

The existing models for spectral based discriminant analysis previously cited are based on 
the assumption that each time series is short-range dependent so that periodograms at different 
Fourier frequencies are approximately independent, and that periodograms from each realization 
can be smoothed to obtain a consistent estimate of a group-common spectrum. Such assumptions 
are required in the spectrum analysis of a single time series, as they are essential in obtaining a 
consistent estimate. However, they are not appropriate for the analysis of replicated time series 
when there exists variability in the second-order spectral structures from different realizations. The 
stochastic transfer function model generalizes the models considered by these existing methods to 
account for the marginal higher-order long-range dependence that is responsible for within-group 
spectral variability. It should be noted that stochastic transfer functions were previously used in 


the semiparametric mixed-effects regression model of Krafty et al. (2011), where transfer functions 
are decomposed into deterministic and stochastic components to allow for the regression analysis 
of time series when multiple correlated replicates are observed from different subjects. Although 
a parameterization of this model using dummy variables can be formulated to define a model of 


groups of independent time series with extra spectral variability, it is not identifiable. 





2.2 Stochastic Cepstra and Log-Spectra 


Our goal is to find interpretable measures that best separate groups and focus on those that are 
measures of relative power, or are linear in log-spectra. To find such measures, we will utilize 
cepstral coefficients, or inverse Fourier transforms of log-spectra. They will provide both a flexible 
and rigorous theoretical framework and guide estimation. Replicate-specific log-spectra are defined 
as 7 jfc(A) = log |^jfc(A)|^, the jth group mean log-spectrum is defined as aj{X) = E{ 7 jfc(A)}, and 
we let /3jfc(A) = 7 j 7 c(A) — aj(A) be the replicate-specific deviation of the log-spectrum of the A:th 


replicate from the jth group. The original formulation of the cepstrum by Bogert et al. (1963) 
considered coefficients in terms of complex trigonometric polynomials. Since we are considering 


real-valued time series, log-spectra are even functions and, in a manner similar to Bloomfield 


(1973), we define the cepstrum through a cosine series. Define the replicate-specific cepstrum as 
Cjk = {cjko,Cjki ,...) G where is the set of real valued sequences indexed by the natural 
numbers N = {0,1,... }, such that 

Cjko= / 7 ifc(A)dA, Cjk£= / 7jfc(A)\/2cos(27rA£)dA, £=1,2,.... 

Jo Jo 

Group-average and replicate-specific deviation cepstra aj,bjk € are similarly defined as the 
cosine series of aj and fJjk, respectively. 

Example: Conditional 1V[A(1), continued. For the collection of conditional MA(1) processes 


considered in Section 2.1, replicate specific log-spectra are 


7ifc(A) = log + 9% + 29jk cos(27rA)}] = log(cr^) + 2 ^ j cos(27r£A), 

i=i 

so that Cjko = log(cr^), Cjki = (—l)^^^\/20j^/£, £ > 1. Recalling that 9jk is uniformly dis¬ 
tributed over the unit interval, the group-average cepstra can be found to be ajko = log (cr^), 
o-jke = (~l)^^^\/2/{£(£-)-1)}, £ > 1. The zero-order cepstral coefficient refiects the conditional 
innovation variance. In our example, replicates have a common conditional innovation variance, so 
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that Var {cjko) = 0. Within-group variability in smooth replicate-specific spectra is reflected in the 
variability of positive-order cepstral coefficients, with 

2(_l)^-rm+2 

Cov {cjk£, Cjkm) TT” , -. \ / « , 7w , 771 > 1. 

3 Discriminant and Ciassification Anaiysis 

3.1 Cepstral Discriminant Analysis 

The cepstral Fisher’s discriminant analysis seeks successive uncorrelated one-dimensional linear 
functions of replicate-specific cepstra that provide maximum separation between group-means 
relative to within-group variability. Before dehning the procedure, we hrst define some nota¬ 
tion. Let a£ = J2j=i be the overall mean cepstrum and define the between-group kernel as 
A(t', m) = J2j=i ~ {o-jm — flm)- Separation between group-means of linear functions of 

cepstra Vot^jke with weights t/q £ is dehned as the weighed sum of the squared distances 
of each group-mean to the overall mean, or ||yo|lA ~ YlTm=oyoi^{'^^^)yom- Defining the within- 
group kernel r(.^, m) = E {bjk£bjkm), the covariance between two linear combinations of cepstra 
with weights yo,yi G can be written as {yo,yi)r = yorr(£, m)yim and the variance of a 

linear combination with weights yo is given by ||yo|lr = ( 2 / 0 )?/o)r- 

Discriminants are dehned sequentially. First discriminants djki are linear functions of cepstra 
dehned by weights yi G such that group means are maximally separated relative to within-group 
variability such that 

OO 

= X] = argmax ||7/||^ . 

£=0 Ihllr^l 

Higher order discriminants maximize group-mean separation among linear combinations orthogonal 
to lower-order discriminants such that gth-order discriminants djkq with weights yq G are dehned 
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as 


djkq = '^yqiCjH, yq= argmax ||y||^. 

£=0 llyllr = l> {y,ym)r=0,m<q 

The number of non-trivial discriminants Q is less than or equal to the ranks of T and A, which is less 
than or equal to J — 1. The discriminants djkq provide low-dimensional measures that best separate 
the J groups. These parsimonious measures can be used to visualize the high-dimensional data, 
such as Figure 5 in the analysis of gait variability in Section and provide an intuitive a powerful 
classification procedure, which is discussed in Section 3.3. The weights yq provide information as 
to how the discriminants can be interpreted as measures of relative power. 

The cepstral Fisher’s discriminant analysis is formulated above when the covariance functions 
of bjk are the same for each group. If there is heterogeneity, all statements made in this article aside 
from those concerning optimal classification rates still apply using the pooled within-group covari¬ 
ance Ylj=i'^j^i^jkebjkm) in lieu of r{£,m). A more detailed discussion of this issue is presented 
in Section]^ A consequence of Proposition 2.1 from Shin (2008) is that cepstral discriminants are 
well defined when cepstral group means are not dissimilar to replicate-specific deviations in the 
sense that Uj is contained in the reproducing kernel Hilbert space with reproducing kernel F for 
j = 1,... ,J. We assume that this regularity condition is satisfied so that discriminants always 
exist. 


3.2 Log-Spectral Weight Functions and Discriminants 

Measures of relative power could have alternatively been defined using integral functions of log- 
spectra. However, due to the unbounded nature of the inverse of a non-singular continuous co- 
variance operator, optimal discriminants will not necessarily exist unless additional strong and 


nonassessable assumptions are made. This issue is discussed by Shin (2008) and by Delaigle and Hall 


(2012). When = VqO + Y^'^i yqiV^cos{2TrXi) exists, djkq = fg ^q(A}jjk(X}dA, 
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and we refer to as a log-spectral weight function. The cepstral based formulation is broader and 
encompass the integral log-spectral formulation in the sense that, if a discriminant exists in the 
integral log-spectral formulation, it is equivalent to djkq and has weight function ^q. 


3.3 Classification Analysis 


Consider a new time series of unknown group membership G Z} with cepstrum c* and 

Q'th cepstral discriminant (i*q. The property that (i*g, q = are uncorrelated with unit 

variance suggests classifying the new time series into htj when the jth group-mean discriminants 
fijq = closer to (d*!,..., d^o)^ than the other J— 1 group means. Formally, if we let 

n(c,) G {1,..., J} index the population to which the new time series is classified, the classification 
rule is 


Q 


n(c*) = argmin I ^ (d*q - ^jqf - 21 og( 7 rj) 




(3.2) 


This classification rule is the optimal centroid classiher of Delaigle and Hall (2012) for replicate- 


specihc log-spectra and properties concerning its classification rate are established in their Theorems 


1 and 2 when T is non-singular and J = 2. If the processes generating 7 ^-^ are Gaussian, then (3.2) 
is optimal in that it has smallest classification error among all spectrum based classihcation rules, 
and this error is bounded from zero. If 7 ^^ is not Gaussian, then the classihcation rule with smallest 
error will depend on the distribution of 7 ^^, and this error is bounded from zero. Although a non¬ 
linear classiher will exist with smaller classihcation error, one must know the distribution of 7 ^-^ to 
hnd this rule and, while it presents some theoretical improvements compared to the Fisher’s cepstral 
procedure for the classihcation problem, it will not provide parsimonious measures to address the 
discrimination problem. 

When within-group spectral variability is not present, asymptotically perfect classihcation can 


be achieved (Zhang and Taniguchi, 1994; Kakizawa et al., 1998). However, when within-group spec- 
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tral variability is present, asymptotically perfect classification is not possible and methods that can 
achieve asymptotically perfect classification in the absence of within-group spectral variability have 
a bounded non-zero error rate. As a time series from one group could possess a replicate-specific 
spectrum that more closely resembles replicate-specific spectra from another group by chance alone, 
this is rather intuitive and illustrates the increased difficulty of classification in the presence of 


within-group spectral variability. It should be noted that Delaigle and Hall (2012) describe a sit¬ 


uation where asymptotically perfect classification is possible for functional data. However, the 
assumption that aj is contained in the reproducing kernel Hilbert space with reproducing ker¬ 
nel r, which is known to be necessary in avoiding degenerate time series models (Parzen, 1961| 


1962), makes asymptotically perfect classification not possible under the stochastic transfer function 


model. 


4 Estimation 

4.1 Cepstral Coefficients 

Consider estimation from n = independent time series epochs of length N, ..., Xjk]\f}, 

j = 1,..., J, k = 1,... ,nj. When replicate-specific log-spectra are smooth, most information is 
contained in lower-order cepstral coefficients. This is illustrated in the conditional MA(1) exam- 

where cjki = Op . Consequently, discriminants and weight 


pie from Sections 


2.1 


and 


2.2 


functions can be estimated through a classical Fisher’s discriminant analysis of a finite number of 
estimated cepstral coefficients. 

We consider the class of replicate-specific cepstral estimators of the form 
N-l N-1 


CjkO X ^ ^ 'yjkmi ^jkl X ^ ^ ^ jkm''/^ (^TrAmf) , £ 1, . . . , [A1/2J , 

m=0 m=0 

where ^jkm is an estimator of 'Yjki^m), = m/N, m G Z. There is an extensive literature on 
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spectrum estimation. Since, for fixed the variance of Cjk^ — Cjkt decays as N increases, not only 
are standard consistent estimators appropriate, such as smoothed periodograms and multitaper es¬ 
timates, but also inconsistent estimators, such as periodograms. Although the consistency results 


for discriminants and weight functions established in Section 4.4 hold for any standard estimator, 


good finite sample performance is contingent on log-spectral estimators having small bias, small 
variance and, since the within-group covariance F must also be estimated, errors that are approxi¬ 
mately uncorrelated across frequency. We advocate the use of the multitaper estimators developed 


by Thomson (1982) that are shown in Percival and Walden (1993) to have small bias, as opposed 


to periodogram based methods, small variance, as opposed to inconsistent estimators, and high 
resolution, as opposed to methods that smooth across frequency. 

Let {hrt, t = 1,..., N} he r = 1,. .., R non-negative orthonormal data tapers such that ^rthst 
1 {r = s}, r,s = 1, ... ,R, where 1 {•} is the indicator function. This article will consider the sine 
tapers 


hrt — 


1/2 


sin I vrt 


^N + lJ V A^ + 1, 

The rth direct spectral estimator is defined as the tapered periodogram under the rth data taper, 


hjkrm 


iV-V 2 hrtX^kte 




, and the multitaper log-spectral estimator is defined as 
/ R \ 


hjkm — log I R ^ ^ Ijkrm 

V r=l / 


(4.3) 


4.2 Discriminants, Weight Functions, and Ciassification 


We estimate discriminants and weight functions through a classical Fisher’s discriminant analysis 
of cj/j = (cjfcO) • ■ •) CjkL-if" for some L smaller than N and n. The data driven selection of L is dis¬ 


cussed in Section 


4.3 


To define this estimator, let itj = rij/n, ah = n • Y2k=i ^jk^ ^ ~ X)}=i > 


and Al = ~ ^ I - a ) . Further, let = ch-aj and f 7, = Lj where 


i'Lj = (%■ ~ 1) ^ ^fk i^fk) ■ estimated gth cepstral weight function is defined 


as a 
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Q'th ordered eigenvector of and an estimated discriminant is defined as = {yq) 

Note that the estimated log-spectral weight function iq{X) = y^Q + Yli=i Vqi^ 
ists and is interpretable even if ^q itself only exists in a limiting sense. The classification rule 
for a new time series with estimated L-dimensional cepstrum can be estimated as n(c^) = 


ex- 


argmiiij 


2-yg=l 




yL 
y q 


- 2 log (tTj) 

It should be noted that Cjki ~ Cjki for all £ = 0,..., L — 1 when a reasonable spectral estimator 
is chosen and when L is small relative to N. Consequently, in practice, the procedure is similar to a 
classical Fisher’s discriminant analysis on {cjko, • ■ ■, CjkL-i)^, and statistical properties of estimators 


follow from classical results for estimated Fisher’s discriminant analysis (Anderson, 2003, Chapter 

6 ). 


4.3 Selecting L 


A leave-out-one cross-validation procedure akin to the cross-validation commonly used for other 
regularized Fisher’s discrimination procedures can be used for the data driven selection of L. Let 
\^jk] index of the classification rule for the kth. time series from Ilj using L cepstral coefficients 

when weight functions are estimated using the n — 1 time series that exclude this series. The cross- 
validation rule selects L = argmin^ X]/=i ^ ^ l ' 


4.4 Consistency 

Consistency of estimated discriminants, weight functions, and classification rules can be established 
under some assumptions as rij, N, and L increase. First, it is assumed that itj is y^-consistent, 
which holds under simple random sampling. 

Assumption 1. TTj = TTj + Op (n 1/2) , 

Appropriate estimators of nj based on the sampling scheme can be used if this does not hold. 
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We will assume the following two regularity conditions so that cepstral coefficients decay at a 


sufficient rate and that eighth moments of Xjkt exist. 

Assumption 2. < oo and Pr < oo^ = 1, j = 1,..., J. 

Assumption 3. sup;ygR E {|(iZjfc(A)|®} < oo and sup^gj^ E < oo, j = 1,..., J. 


Assumption 2 implies that mean and replicate-specific deviations of log-spectra are absolutely 
continuous with square integrable first derivatives and, consequently, error incurred by using only 
a finite number of cepstral coefficients is asymptotically negligible. The moment conditions in 
Assumption 3 imply that the eighth moments of Xjj^t are bounded. Our estimators depend on 
rand Ai, which are fourth-order functions of Xj^^, and these conditions assure their variances 
exist. Additionally, it is assumed that a log-spectral estimator is chosen that has asymptotically 
optimal bias (up to a constant), resolution, and bounded variance in the sense that the following 
assumption holds. 

Assumption 4. For j = 1, ..., J, sup^=o,...,LAf/ 2 j ^{ijkm - 7jA:(Am)} = i^i + for some 

Ui e M, sup^^p=o,...,LAf/2j Cov{^jkm “ ljk{\m),ljkp “ ljk{>^p)] = 0{N-^), and 
sup^= 0 ,...,LAr/ 2 j Var {7jfcm - 7ifc(Am)} = k '2 + 0{N~^) for some 1^2 > 0. 

Log-spectral estimators need only be asymptotically unbiased up to a constant since the discrimi¬ 
nant analysis is invariant to the addition of a constant. Assumption 4 holds for multitaper estimates, 
including tapered periodograms when R= 1. Whereas consistency results for multitaper estimates 
from a single realization require the number of tapers to grow with respect to the number of time 
points, we consider fixed R, since the variance reduction achieved by increasing the number of 
tapers is inherently achieved by projecting onto a finite cosine basis. 


Lemma 1. Assumption 4 holds for fjkm defined by (4.3) under Assumptions 2 and 3. 
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The proof of Lemma [^mirrors the proof of Theorem 1 from Krafty et al. (2011). It should be noted 
that, although this theorem is formulated for untapered periodograms, the properties underlying 


the Bartlett’s expansion used in the proof that are formulated in Janas and von Sachs (1995) hold 
for tapered data. 

The following theorem establishes consistency under these assumptions when the growth of L is 
restricted by the growth of n and N. The main obstacle in performing Fisher’s discriminant analysis 
when the dimension is large compared to the number of observations stems from the divergence 
of the eigenvalues of the within-group covariance and its inverse. The smoothness of log-spectra 
implies that the largest eigenvalue of the truncated within-group covariance matrix E | 


jk jk 

is bounded from above but that its smallest, which we will refer to as aL, approaches zero as L —>■ oo. 

, T' 


To assure that the inverse of E J consistently estimated by its sample version, 

we must assume that the number of replicates grows quickly compared to the number of coefficients 
by assuming that —)■ 0 and —)• 0. Replicated-specific cepstral coefficients are not 

observed but estimated from replicates of length N. To ensure that the error incurred by this 
estimation is asymptotically negligible, we must also limit the growth of the number of coefficients 
by the length of the time series such that —?• 0. 


Theorem 1. Under Assumptions 1-4, for every qth weight function yq, there exist a series of 
qth eigenvectors such that, if a —)■ 0, —)■ 0 and —>• 0 as 

n,N,L^oo, II (y^, 0,...) - yq| |^ 4 0. 

A direct consequence of the T-norm consistency of weight functions is that d^^^ —)• djkq and 

n(c;^) 4 n(c*). 
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5 Simulation 


We conducted a simulation study to investigate the empirical properties of classification proce¬ 
dures when within-group spectral variability exists. In these simulations, time series from J = 3 
groups are simulated as conditional autoregressive processes = (pjkiXjkt-i + 4 'jk 2 Xjkt -2 + ^jkt 
where ejkt are independent Gaussian white noise with conditional variance cr|^. Autoregressive 
parameters are drawn as independent uniform random variables where ~ Uni (0.05,0.7), 

ct>ik 2 ~ Uni (-0.12, -0.06), ~ Uni (0.01,1.2), ^ 2 k 2 ~ Uni (-0.36, -0.25), ~ Uni (0.12,1.5), 

and ^zk 2 ~ Uni (—0.75,—0.56). Three ranges for conditional innovation variances are explored 
where are drawn as independent uniform random variables over [0.1,10], [0.3, 3], and [0.9,1.1]. 
One thousand random samples are drawn for each of the 27 combinations of the three ranges of 
innovation variances, three numbers of time series per group in training data rij = 15, 50,100, 
and three time series lengths N = 250, 500,1000. Figure [^ displays simulated replicate-specific 
log-spectra when rij = 50 and (t|^ G [0.3,3]. A test data set of 50 time series per group is drawn 
for each random sample to evaluate out-of-sample classihcation rates. 

The proposed cepstral Fisher’s discriminant analysis was implemented using three log-spectral 
estimators: the multitaper estimator with R = 7, a direct estimator, and a smoothed estimator. 
Direct estimators were taken to be first tapered log-periodograms. Smoothed estimates were ob¬ 
tained by taking the logarithm after smoothing Ijkim across frequency via the modihed Daniell 
smoother with span selected through generalized cross-validation (Ombao et ah, 2001[ ). Addition¬ 
ally, two popular methods for the discriminant analysis of time series in the absence of within-group 
spectral variability were implemented using Kullback-Leibler and Chernoff information measures 
( ]Kakizawa et ah' , [1998 Shumway and Stoffer, 2011). Replicate-specihc spectra were individually 
estimated by smoothing periodograms with a modified Daniell smoother and span selected through 
generalized cross-validation. A test time series is classified into Ilj when the information measure 


18 























Group 1 


Group 2 


Group 3 





Figure 3: Replicate-specific log-spectra from simulated conditional AR(2) processes with rij = 50 
and e [0.3, 3]. 

distance between its smoothed periodogram and the average of the smoothed periodograms from 
the Ilj training data is smaller than its distance to the average of the smoothed periodograms from 
the training data from the other two groups. The tuning parameter for the Chernoff measure was 
selected using an appropriately modified version of the cross-validation procedure outline in Section 

Ol 

Table 1 displays the mean and standard deviation of the classification rates. In every setting, 
the cepstral Fisher’s discriminant analysis, using any of the three log-spectral estimators, had 
higher mean classification rates than the two information criterion based methods. Although the 
three cepstral Fisher’s discriminant procedures display comparatively similar performance, in each 
setting, the multitaper based method had the best classification rate and the direct method has 
the poorest. Changes in the amount of within-group spectral variability by changing the range of 
conditional innovation variances, which is the same for each group, do not affect the performance 
of the cepstral Fisher’s procedures. However, the classification rates of the procedures that ignore 


19 













Cepstral 

Cepstral 

Cepstral 


Kullback- 

^Jk 

Tlj 

N 

Multitaper 

Direct 

Smoothed 

Chernoff 

Leibler 

[0.1,10] 

15 

250 

91.1 (3.5) 

87.3 (3.7) 

89.4 (3.6) 

80.2 (4.9) 

79.2 (4.9) 



500 

94.0 (2.7) 

91.5 (3.4) 

93.2 (2.7) 

81.1 (4.8) 

80.1 (4.6) 



1000 

95.6 (2.8) 

94.1 (3.1) 

95.4 (2.7) 

81.4 (4.8) 

80.7 (4.7) 


50 

250 

93.1 (2.2) 

90.0 (2.6) 

91.8 (2.5) 

83.0 (3.5) 

81.9 (3.4) 



500 

95.5 (1.8) 

93.6 (2.2) 

94.9 (1.9) 

84.1 (3.8) 

82.8 (3.7) 



1000 

97.0 (1.7) 

95.9 (1.7) 

96.8 (1.6) 

84.7 (3.6) 

83.5 (3.6) 


100 

250 

93.8 (2.0) 

90.6 (2.4) 

92.3 (2.2) 

84.2 (3.3) 

82.8 (3.5) 



500 

95.7 (1.7) 

93.9 (2.0) 

95.2 (1.8) 

84.8 (3.2) 

83.5 (3.3) 



1000 

97.2 (1.5) 

96.0 (1.6) 

96.9 (1.5) 

85.3 (3.2) 

84.2 (3.5) 

[0.3,3] 

15 

250 

91.1 (3.1) 

86.8 (4.3) 

89.3 (3.7) 

82.7 (3.9) 

82.4 (4.0) 



500 

93.9 (3.0) 

91.6 (3.6) 

93.4 (3.0) 

83.7 (3.9) 

83.2 (4.0) 



1000 

95.5 (2.7) 

94.1 (2.9) 

95.4 (2.7) 

84.4 (4.1) 

83.9 (4.1) 


50 

250 

93.1 (2.1) 

89.8 (2.7) 

91.7 (2.3) 

84.9 (3.2) 

84.4 (3.4) 



500 

95.5 (1.8) 

93.7 (2.1) 

94.9 (1.9) 

85.8 (3.4) 

85.3 (3.5) 



1000 

97.0 (1.6) 

95.7 (1.8) 

96.8 (1.6) 

86.6 (3.0) 

85.9 (3.5) 


100 

250 

93.7 (2.0) 

90.5 (2.5) 

92.3 (2.2) 

85.4 (2.9) 

84.8 (3.0) 



500 

95.7 (1.7) 

94.0 (2.0) 

95.2 (1.8) 

86.5 (2.9) 

86.0 (3.1) 



1000 

97.3 (1.5) 

96.1 (1.7) 

97.0 (1.5) 

87.4 (2.8) 

86.8 (3.1) 

[0.9,1.1] 

15 

250 

91.2 (3.3) 

86.9 (4.0) 

89.6 (3.6) 

85.9 (3.2) 

85.7 (3.2) 



500 

93.9 (3.0) 

91.5 (3.4) 

93.4 (3.0) 

86.6 (3.2) 

86.3 (3.3) 



1000 

95.8 (2.6) 

94.4 (2.7) 

95.5 (2.6) 

87.6 (3.1) 

87.3 (3.2) 


50 

250 

93.2 (2.2) 

89.9 (2.6) 

91.8 (2.3) 

86.2 (3.0) 

86.1 (3.0) 



500 

95.5 (1.8) 

93.6 (2.2) 

95.0 (1.9) 

87.1 (2.8) 

86.6 (3.0) 



1000 

97.1 (1.6) 

95.7 (1.8) 

96.8 (1.6) 

88.1 (2.8) 

87.6 (3.0) 


100 

250 

93.6 (2.0) 

90.4 (2.3) 

92.3 (2.2) 

86.2 (2.9) 

85.8 (3.0) 



500 

96.0 (1.7) 

94.1 (2.0) 

95.4 (1.8) 

87.6 (2.6) 

87.0 (2.9) 



1000 

97.3 (1.4) 

96.1 (1.6) 

97.1 (1.5) 

88.4 (2.6) 

88.0 (2.9) 


Table 1: Mean (standard deviation) of the percent of correctly classified replicates. 
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1st Weight Function 


2nd Weight Function 




Figure 4: Estimated log-spectral weight functions from the gait analysis study, 
within-groups spectral variability have reduced classification rates when this variability increases. 


6 Analysis of Gait Variability 


Patterns of gait variability can provide insight into how neurological conditions affect the systems 


that regulate walking (Hausdorff et ah, 2000 Hausdorff, 2005). The discriminant analysis of gait 


variability from people with different neurological conditions can provide a tool for characterizing 
pathologies, aid in the diagnosis of neurodegenerative disease, and aid in the evaluation of treatment 
efficacy. In this section, we consider a discriminant analysis of gait variability from three groups of 


participants in the the study described by Hausdorff et al.l (2000): healthy controls, participants 


with amyotrophic lateral sclerosis, a disease characterized by a loss of motoneurons, and participants 
with Huntington’s disease, a pathology of the basal ganglia. These data can be obtained through 
PhysioNet ( jGoldberger et a~ 2000). 

In the study, participants were fitted with pressure sensors on the soles of their feet and told 
to walk at a normal pace. The information collected was used to compute stride intervals, or the 
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1st Discriminant 


Figure 5: Estimated discriminants from the gait analysis study: “C” denotes healthy controls, 
“A” denotes participants with amyotrophic lateral sclerosis, and “H” denotes participants with 
Huntington’s disease. 


elapsed time for each gait cycle. The present analysis considers the 3.5 minutes of stride intervals 
defined by the left foot after a 20 second start-up period. After a 3 standard deviation median filter 
was applied to remove artifacts associated with turning at the end of the hallway (]Hausdorff et al.f 


2000), cubic smoothing spline interpolants of the stride intervals as functions of time were sampled 


at 2 Hz and linear trends were removed. The resulting data are detrended stride interval series of 
length N = 420 from n = 45 participants: 16 healthy controls, 11 participants with amyotrophic 
lateral sclerosis, and 18 participants with Huntington’s disease. 

Log-spectra were estimated using R = 7 multitapers and cross-validation selected L = 4 co¬ 
efficients. Estimated log-spectral weight functions and I 2 are displayed in Figure To aid 
interpretability, we only display power spectra for frequencies between [0,0.5] Hz (]Henmi et al.| 


2009). 


The hrst log-spectral weight function is a contrast in power from frequencies greater than 0.08 
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Hz versus power from frequencies less than 0.08 Hz; large positive values for the first discriminant 
indicate larger relative power from higher frequencies. The second weight function is a contrast be¬ 
tween power from frequencies between 0.05-0.35 Hz versus power from frequencies greater than 0.35 
Hz; large positive values for the second discriminant indicate more power from lower frequencies. A 
scatter plot of the estimated discriminants djki, djk 2 is displayed in Figure]^ The first discriminant 
primarily separates participants with Huntington’s disease from the other two groups while the sec¬ 
ond discriminant primarily separates controls from participants with amyotrophic lateral sclerosis. 
This finding that groups are separated by contrasts in power from higher frequencies versus lower 
frequencies is not unexpected. Neurodegenerative diseases such as amyotrophic lateral sclerosis 
and Huntington’s disease affect systems that are important in maintaining a steady stride, thus 


introducing power at higher frequencies. [Hausdorff et al. (2000) found that, on average, positive 
values of the autocorrelation function decay slower for controls as compared to participants with 
Huntington’s disease, which is indicative of healthy controls possessing more power at lower fre¬ 
quencies, and that the average decay for participants with amyotrophic lateral sclerosis is between 
that of controls and participants with Huntington’s disease. 

Leave-out-one cross-validation was used to empirically assess the effectiveness of the classifica¬ 
tion rule. Fourteen of the 16 controls were correctly classihed, 1 misclassified as having amyotrophic 
lateral sclerosis, and 1 misclassified as having Huntington’s disease. Fifteen of the 18 participants 
with Huntington’s disease were correctly classified, 2 incorrectly classified as being controls, and 1 
misclassified as having amyotrophic lateral sclerosis. The rule did comparatively worse in classifying 
participants with amyotrophic lateral sclerosis: 5 were correctly classified, 4 classified as controls, 
and 2 classified as having Huntington’s disease. The Kullback-Leibler and Chernoff information 


measure classifiers of Kakizawa et al. (1998) were also implement and their performances evaluated 
though leave-out-one cross-validation. The two information measure classifiers had identical perfor- 
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mance. They correctly classified 11 of the 16 controls, 12 of the 18 participants with Huntington’s 
disease, and 5 of the 11 participants with amyotrophic lateral sclerosis. In addition to providing 
a more accurate classification rule, cepstral Fisher’s discriminant analysis produced interpretable 
estimated log-spectral weight functions and two-dimensional discriminant plot that can be used for 
illustration, which is lacking from the information measure classifiers. 

7 Discussion 

Traditional models for the frequency domain discriminant analysis of time series assume each series 
from a group are independently and identically distributed as models that are used in the spectral 
analysis of a single time series. However, such models fail to account for within-group spectral 
variability that is present in most real world applications. This article introduced the stochastic 
transfer function model that can account for this within-group variability, which is a product of 
long-range higher-order dependence. A cepstral based Fisher’s discriminant analysis is developed 
under this model. The procedure provides parsimonious low-dimensional measures that illuminate 
scientihc mechanisms that best separate groups and, when within-group spectral variability is 
present, provides more accurate classification of new observations as compared to methods that do 
not account for within-group spectral variability. 

The cepstral Fisher’s discriminant analysis was presented in this article under the assumption 
of equal within-group spectral covariance functions. If these covariance functions differ such that 
= E (bjkebjkm) depends on j, the discriminants can be defined by using F = 

Under this setting, if group membership is viewed as a random variable such that a randomly 
selected time series has probability ttj from being in group j, discriminants have the interpretation 
of being linear functions that maximize the variance of the conditional expected value relative to 
the expected value of the conditional variance, where conditioning is taken with respect to group 
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membership. The proposed estimator, which already utilizes the estimate Fl = is 

still valid in this setting and the consistency results established in Theorem 1 hold. The only 
statements made in this article that need to be adjusted when log-spectral covariances depends on 
group are those that are made with respect to classification rates. The optimal classification rule 
under normality in this settings will be quadratic and, while providing some gain in classification 
accuracy, fails to address the discriminant analysis question or provide interpretable parsimonious 
measures, such as the estimated weight functions and discriminants in Figures and Addition¬ 
ally, although the theoretically optimal classifier is quadratic, Fisher’s classification procedures are 


robust to heteroscedasticity (O’Neill, 1992) and outperform quadratic procedures in practice for 


high-dimensional data where parameters must be estimated (Cheng 2004). 


Only univariate time series were considered in this article while the information criteria based 


methods of Kakizawa et al. (1998) are applicable to vector-valued time series. Although the stochas¬ 


tic transfer function model for time series in the presence of within-group spectral variability can 
be extended to the vector-valued setting in a straight forward manner, due to the non-convexity 


of the matrix exponential discussed in Section 6 of Krafty and Collinge (2013), multivariate ana¬ 


logues of cepstral coefficients have yet to be developed. Consequently, the extension of the proposed 
discriminant analysis procedure to the vector-valued setting is not straight forward. A cepstral dis¬ 
criminant analysis could be conducted on the collection of component-specific cepstral coefficients, 
however such an analysis ignores coherence between different components. The extension of the 
proposed procedure to the vector-valued setting will be the focus of future work. 
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Appendix: Proofs 


Proof of Theorem 1 

To prove Theorem 1, we will make use of two lemmas. Lemma whose proof is provided in 
the subsequent subsection, formalizes the approximation of the true infinite dimensional weight 
functions from the weight functions obtained from truncated true cepstra = {cjko, ■ ■ ■, CjkL-i)'^, 
while Lemma follows directly from Assumptions 2-4 and the orthonormality of the cosine series. 

Lemma 2. Under Assumption 2, for a qth cepstral weight function yq, there exists a series of qth 
weight functions for the Fisher’s discriminant analysis ofc^^ such that 11 (y^ , 0,...) — yq| —)• 0 
as L —>■ oo 


Lemma 3. Under Assumptions 2-4, as N ^ oo, E {cjko — cjko — = O (^N and 

sup£=i,...,LAr/2j E {cjke - Cjkef = O (iV"^). 

Define af = (a^o, • ■ •, = {bjko, ■ ■ ■ ,bjkL-if, |, and = 

where a^ = Ylj=i Further, let || • || be the operator norm on 

Lx L matrices such that, for an Lx L matrix A, ||A|| = supyTy^^ (y^A^Ay). Note that showing 
I if K.I — r^^A^II A 0 implies that Yq ^ Yq, and since Lemma established that Yq converges 
to yq in 11'I Ip, completes the proof of Theorem 1. To ease exposition, we simplify the notation in 
this proof by suppressing the dependence of parameters and estimates on L. From Cauchy-Schwarz 
and basic matrix algebra, it follows that 


f ^A-r~^A 

< 

r-Ut-T)t ^a-t-Ua-a) 



\ J \ J 


< a 


-1 


r-r 


^-1 


+ CT 


-1 


A-A 


(A.l) 
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To investigate the decay of | jF—r| | < I “^11) note that, since the operator norm is dom¬ 
inated by the Frobenius norm, E| ifj —r| p — FromLemmaj^ and the 

definition of f letting tjc = supj E\bjk£\^, there exists a C > 0 such that E |f j(t', m) — r(£, m)| < 
(n"i soE||fj-r|p < C (j2e=o V^e) Since Pr l^jkef < oo 

1 and E < oo, % = 0{£~^) and limi^oo (^X^£=o V^i') ^ then follows that Hf — r|| = 

Op(n-V2) +Op(Ar-i/2). 


It follows from similar arguments to those used above that ||A —A|| = Op (n +Op (N 
Since the approximation of the finite between-group kernel is stable and the inhnite dimensional 


kernel is bounded, ||A|| = Op(l) (Chatelin, 1981, Proposition 2.2), and consequently, ||A|| = Op(l}. 

To investigate the decay of ||r ||, let d be the smallest eigenvalue of F so that ||r || = 

Define the matrix F = let a be its smallest eigenvalue, and note that 

It follows from Lemma I and from |Bai and Yi^ ( |l993[ ) that d = 1 -|- Op{Ln -|- Op(N and 
consequently when N ^ oo and Ln~^/‘^ —^ 0, Hf ^|| = a~^ + Op{a~^Ln~^) -|- 
Plugging these results for ||r — F||, ||A — A||, ||A|| and ||r || into Equation 


A.l 


we conclude 




that ||f A —F ^h.\\ = Op{a ^/‘^) + Op{a “^N ^/‘^) + Op{a “^Ln which, when u 0, 

—)• 0 and —)■ 0, decays to zero. 


Proof of Lemma 2 


For an M^-valued random variable g with covariance kernel F, let L‘^{g) represent the Hilbert space 
spanned by linear functions of g with inner product Ylm=o '^mgm)g = Y,T,m=o m) 

{w,v)y- Further, let (•, be the inner product of the reproducing kernel Hilbert space 'H(r) that 


has reproducing kernel F. As discussed in Aronszajn (1950), there exists a congruence between T-LiT) 


and L‘^{g) defined through the linear map T {F(£, •)} = g^. Define the operator T : 'H(F) —)• 'H(F) 
as {Th)i = (A(£, ■),h)g{, h G L^(F). The operator T is self-adjoint, positive and compact so that it 
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possess a spectral decomposition T = ® where Tq are nonincreasing eigenvalues and 


9q are orthonormal eigenfunctions in H (F) (Shin 


2008 


Theorem 2.4). We will also consider as 


the minimum T-L(T) 0 'H{^) norm interpolate to Al on (0,..., L — 1) x (0,..., L — 1) and its asso¬ 
ciated operator : 'H{T) —)> ?^(r) defined as = {Ai^{i, ■),h)'^. Let be the nonincreasing 

set of eigenvalues of the operator Tl and 0^ be a set of associated eigenfunctions. The following 


lemma, whose proof mirrors the proof of Lemma 4 in Eubank and Hsing (2008) that deals with 


approximating the operator of a canonical correlation analysis, relates the spectral decomposition 
of Tl to the Fisher’s discriminant analysis of 

Lemma 4. For every qth eigenfunction 6^ of Tl, there exists a qth eigenfunction ofVf^AL 
such that \ \^{6^) - J2i=o VgedeWg = 0- 

A consequence of Lemma and the L‘^{g) equivalence between T {6q) and Yl’^oVqidi is that 
the proof of Lemma 2 can be completed by showing the existence of qih eigenfunctions 6^ of Tl 
such that 9q ^ 9q vaFLiT). 

By Cauchy-Schwarz, || (T - Tl)< \\A - ALW-H^vWhW-H for all h G ^{T) where || • \\\u®'H 
is the tensor product norm. The decay of cepstral coefficients under Assumption 2 implies that 
II A — Al\\'h®'H 0 and consequently, Tl converges to T in operator norm. This convergence, 

when combined with the property that Tl is stable, implies that the set of unique eigenvalues of 

Proposition 2.2). Further, if 


Tl converges to the set of unique eigenvalues of T (Chatelin 


1981 


is a series of eigenvalues that converge to Tq, the multiplicity of Tq is always greater than or equal 


to the multiplicity of t„ (Chatelin 


1981 


Lemma 2.1). Since there exists a Lq such that the rank 


of Tl is equal to Q for all L> Lq, for large L, the multiplicity of Tq is the same as Tq. Proposition 


2.3 (iii) of Chatelin (1981) can then be applied to conclude that there exists a set of eigenfunctions 


9q of Tl such that 9q ^ 9q. 
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